DETAILED ACTION

1. This action is responsive to the application filed on 14 January 2004.

Drawings 

2. The drawings are objected to because, in FIG. 2, it is believed that two arrows
should lead from item 216: one representing the affirmative answer and the other
representing the negative answer to the question "Is N-gram in Language Database".
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to 
the Office action to avoid abandonment of the application. Any amended replacement 
drawing sheet should include all of the figures appearing on the immediate prior version 
of the sheet, even if only one figure is being amended. The figure or figure number of an 
amended drawing should not be labeled as "amended." If a drawing figure is to be 
canceled, the appropriate figure must be removed from the replacement sheet, and 
where necessary, the remaining figures must be renumbered and appropriate changes 
made to the brief description of the several views of the drawings for consistency. 
Additional replacement sheets may be necessary to show the renumbering of the 
remaining figures. Each drawing sheet submitted after the filing date of an application 
must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" 
pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the 
applicant will be notified and informed of any required corrective action in the next Office 
action. The objection to the drawings will not be held in abeyance. 
Claim Objections

3. Claim 2 is objected to under 37 CFR 1.75(c), as being of improper dependent 
form for failing to further limit the subject matter of a previous claim. Applicant is 
required to cancel the claim(s), or amend the claim(s) to place the claim(s) in proper 
dependent form, or rewrite the claim(s) in independent form. Each limitation in claim 2 is
already recited in claim 1, from which claim 2 depends.

4. Claim 9 is objected to because the preamble begins with "method of determining",
which it is believed should read "A method of determining".



Claim Rejections - 35 USC §112 

5. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

6. Claims 1-6 and 9-11 are rejected under 35 U.S.C. 112, second paragraph,
as being indefinite for failing to particularly point out and distinctly claim the subject 
matter which applicant regards as the invention. 

Claim 1 recites the limitation "said n-grams" in (c). It is unclear whether the
limitation refers to the n-grams of the textual passage or to the n-grams of the
databases. The Examiner will read the limitation as referring to the textual passage
n-grams. Claims 2-6 also recite the limitation "said n-grams" and raise the same issue
as claim 1, from which they all depend.
Claim 9 recites the limitation "said short words" in (c). It is unclear whether the
limitation refers to the short words of the textual passage or to the short words of
the databases. The Examiner will read the limitation as referring to the textual
passage short words.

Claim 10 recites the limitations "said n-grams and said short words" in (c). It is
unclear whether the limitations refer to the n-grams and short words of the textual
passage or to the n-grams and short words of the databases. The Examiner will read
the limitations as referring to the textual passage n-grams and short words.

Claim 11 recites the limitation "said n-grams" in (d). It is unclear whether the
limitation refers to the n-grams of the textual passage or to the n-grams of the
databases. The Examiner will read the limitation as referring to the textual passage
n-grams.

Claim Rejections - 35 USC § 103

7. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

8. Claims 1-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Schulze (USPN 6,167,369) in view of Messerly et al. (USPN 6,076,051).

Claim 1: 
Schulze discloses a method of determining the language of a textual passage,
the method comprising the steps of: 

(a) parsing said textual passage into a plurality of n-grams ("tokenizes large
samples of text", col. 1, lines 18-34);

(b) comparing each of said n-grams with a plurality of databases, wherein each 
of said databases comprises a list of n-grams associated with a specific language ("The 
probabilities are then used to guess the language of a sentence", col. 1, lines 18-34); 

(c) determining an initial weight for each of said n-grams, per language, by 
calculating the frequency with which each of said n-grams appears in each of said 
databases and dividing said frequency by the total number of n-grams in said respective 
database ("probability of a retained trigram is approximated by summing the frequency 
of all retained trigrams for the language and dividing the trigram's frequency by the sum 
of frequencies", col. 1, lines 18-34). 
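For illustration only, steps (a) through (c) as claimed might be sketched in Python as follows. The sketch is not taken from Schulze or from the application; it assumes that each language database is a hypothetical mapping from n-gram to occurrence count and that the passage is split into overlapping character n-grams. The helper names `char_ngrams` and `initial_weights` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Step (a), assumed scheme: overlapping character n-grams of the lowercased passage."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def initial_weights(passage_ngrams, databases):
    """Step (c): per language, divide the frequency of each passage n-gram in that
    language's database by the total number of n-grams in the database.

    `databases` is a hypothetical mapping {language: {n-gram: count}};
    each database is assumed to be non-empty.
    """
    weights = {}
    for lang, db in databases.items():
        total = sum(db.values())
        # Step (b) comparison: an n-gram absent from the database gets frequency 0.
        weights[lang] = {g: db.get(g, 0) / total for g in set(passage_ngrams)}
    return weights


if __name__ == "__main__":
    dbs = {
        "en": Counter(char_ngrams("the cat and the dog")),
        "de": Counter(char_ngrams("der hund und die katze")),
    }
    print(initial_weights(char_ngrams("the hund"), dbs))
```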

However, Schulze does not explicitly disclose determining the number of
databases containing each n-gram and dividing each n-gram's initial weight by said
number.

In a similar information retrieval method, Messerly discloses 

(d) determining the number of said databases within which each of said n-grams 
(token combination) appear ("inverse document frequency", col. 12, equation 1); 

(e) altering said initial weight for each of said n-grams by multiplying said initial 
weight with the inverse of said number of databases within which each of said n-grams 
appear ("The facility preferably uses a combination of inverse document frequency ... to
rank the matching target documents", col. 12, lines 36-39). 
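As an illustrative sketch only, steps (d) and (e) as claimed could be expressed as below. It follows the claim wording, multiplying the initial weight by the plain reciprocal of the database count, and does not purport to reproduce Messerly's equation 1. The function names and the dictionary layout (the same hypothetical {language: {n-gram: count}} databases and {language: {n-gram: weight}} initial weights) are assumptions.

```python
def database_counts(passage_ngrams, databases):
    """Step (d): number of language databases in which each passage n-gram appears."""
    return {
        g: sum(1 for db in databases.values() if db.get(g, 0) > 0)
        for g in set(passage_ngrams)
    }


def altered_weights(initial, counts):
    """Step (e): multiply each initial weight by 1 / (number of databases
    containing that n-gram).

    An n-gram found in no database is given 0.0 (its initial weight is already 0),
    which also avoids division by zero.
    """
    return {
        lang: {
            g: (w / counts.get(g, 0) if counts.get(g, 0) else 0.0)
            for g, w in per_gram.items()
        }
        for lang, per_gram in initial.items()
    }
```

An n-gram shared by many language databases is thus down-weighted relative to one that appears in only a single database, which is the effect the rejection attributes to inverse document frequency.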

It would have been obvious to one with ordinary skill in the art at the time of the
invention to alter the initial weights of the n-grams in Schulze's method using inverse
document frequency because that would give "greater weight to a token combination
appearing in fewer of the targets documents" (Messerly, col. 12, lines 40-43).

Schulze further discloses 

(f) producing the weight of each language over the text passage by calculating, 
per language, the sum over each n-gram in the text passage of the products of the 
number of times that that n-gram appears in the text passage and the language-specific 
altered weight calculated in step (e) for that n-gram ("dividing the sentence into trigrams 
and calculating the probability of the sequence of trigrams for each language", col. 1, 
lines 18-34); 

(g) sorting the list of per language passage weights from step (f) in decreasing 
order, returning the most likely language for the text passage as the first element 
(highest weight) in the list ("The language with the highest probability for the sequence 
of trigrams is chosen", col. 1, lines 18-34). 
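For illustration only, steps (f) and (g) as claimed might be sketched as follows. Schulze, as quoted above, chooses the language with the highest probability for the sequence of trigrams; the sketch instead follows the claim's recited sum of products of passage occurrence counts and altered weights. The function name `rank_languages` and the input layout are hypothetical.

```python
from collections import Counter


def rank_languages(passage_ngrams, altered):
    """Steps (f)-(g): per-language passage weight, sorted in decreasing order.

    `altered` is a hypothetical mapping {language: {n-gram: altered weight}}.
    The first element of the returned list is the most likely language.
    """
    occurrences = Counter(passage_ngrams)
    scores = {
        # Step (f): sum over passage n-grams of (times in passage) * (altered weight).
        lang: sum(n * weights.get(g, 0.0) for g, n in occurrences.items())
        for lang, weights in altered.items()
    }
    # Step (g): sort by passage weight, highest first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```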

Claim 2: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein the step of determining an initial weight for each of said n-grams, per language,
comprises the steps of calculating the frequency with which each of said n-grams
appears in each of said databases and dividing said frequency by the total number of
n-grams in said respective database ("probability of a retained trigram is approximated by
summing the frequency of all retained trigrams for the language and dividing the
trigram's frequency by the sum of frequencies", col. 1, lines 18-34).

Claim 3: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said n-grams have a size selected from the group consisting of bi-grams,
tri-grams, and quad-grams (col. 16, lines 16-21).

Claim 4: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said n-grams are anchored n-grams (col. 16, lines 16-21).

Claim 5: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said n-grams are replacement-type n-grams (col. 16, lines 16-21).

Claim 6: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said n-grams are any combination of n-grams, including anchored n-grams
and/or replacement-type n-grams, and/or n-grams of different lengths (col. 16,
lines 16-21).

Claim 7: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said textual passage comprises 20 or more words (col. 6, lines 62-65).

Claim 8: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein said textual passage comprises 40 or more words (col. 6, lines 62-65).

Claim 9: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein the n-grams are words (col. 1, lines 34-46).

Claim 10: 

Schulze and Messerly disclose the method of claim 1; Schulze further discloses
wherein the n-grams are a combination of n-grams and words (col. 2, lines 55-64).

Claim 11: 

Claim 11 is similar in scope and content to claim 1 and is rejected with the same
rationale.

Claim 12: 

Schulze and Messerly disclose the system of claim 11; Schulze further discloses
a scanner and an optical character recognition device, wherein said scanner and said
optical character recognition device are connected to said central processing unit,
wherein said program receives a textual passage from a document scanned by said
scanner (col. 17, line 65 to col. 18, line 10).

Claim 13: 

Schulze and Messerly disclose the system of claim 11; Schulze further discloses
wherein said program comprises a user interface that allows a user to enter said textual
passage (col. 18, lines 11-12).

Claim 14: 

Schulze and Messerly disclose the system of claim 13; Schulze further discloses
wherein said user interface is a graphical user interface (col. 18, lines 11-15).

Claim 15: 

Schulze and Messerly disclose the system of claim 13; Schulze further discloses
wherein said user interface displays the identified language (col. 18, lines 11-15).

Claim 16: 

Schulze and Messerly disclose the system of claim 11; Schulze further discloses
wherein said program comprises a user interface that allows a user to enter a Uniform
Resource Locator that contains said textual passage (col. 17, lines 51-64).

Conclusion 

9. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

a. Damashek (USPN 5,418,951) discloses a method of identifying, retrieving,
or sorting documents by language or topic involving the steps of creating an
n-gram array for each document in a database, and parsing an unidentified
document or query into n-grams.

b. de Campos (USPN 6,272,456) discloses a method for identifying the
language of a document from a small sample input of the document by using
n-gram profiles.

c. Cavnar et al. ("N-Gram Based Text Categorization", Proceedings of
SDAIR-94, 3rd Annual Symposium on Document Analysis and Information
Retrieval, 1994) discloses a method for identifying the language of a text using
n-gram based text categorization.

10. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Samuel G. Neway, whose telephone number is
571-270-1058. The examiner can normally be reached on Monday - Friday, 8:30 AM - 5:30 PM.



If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, David R. Hudspeth, can be reached at 571-272-7843. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 








DAVID HUDSPETH 
SUPERVISORY PATENT EXAMINER 
TECHNOLOGY CENTER



